Safe HTML checker
23rd February 2003
I’ve finally enabled a subset of HTML in my comments. In doing so, I had several requirements that needed to be fulfilled:
- Entered markup must be valid XHTML Strict, to stop comments from breaking validation and to keep things nice and tidy.
- No presentational markup! I want to maintain control over how things look via my stylesheets—comments posted should only be able to use structural HTML elements.
- Attributes should be restricted to those that add semantic meaning. JavaScript event attributes and CSS-related attributes should not be allowed.
- I should retain full control over the tags and attributes allowed in the comments.
- Submitted HTML must be kept free from anything that could pose a security risk, such as javascript: URLs.
The system I have implemented works by running submitted posts through an XML parser, which checks that each element is in my list of allowed elements, is nested correctly (you can’t put a blockquote inside a p, for example) and doesn’t have any illegal attributes. My initial tests have shown it to work pretty well, but if anyone wants to have a go at breaking it, please be my guest.
The code for the main class is available here: SafeHtmlChecker.class.php
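To make the idea concrete, here is a rough sketch of the same kind of check. It is written in Python purely for illustration and is not the code from SafeHtmlChecker.class.php. The allowlists (ALLOWED, CHILDREN), the function names (check_comment, is_safe_url), the wrapper element called root and the decision to accept only http and https URLs are all assumptions made up for this example. It also checks only elements, nesting and attributes, so it falls short of full XHTML Strict validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical allowlist: element name -> attributes permitted on it.
ALLOWED = {
    "p": set(),
    "blockquote": {"cite"},
    "ul": set(),
    "li": set(),
    "a": {"href", "title"},
    "em": set(),
    "strong": set(),
    "code": set(),
}

INLINE = {"a", "em", "strong", "code"}

# Hypothetical nesting rules: parent element -> children it may contain.
# "root" is the wrapper added around the comment before parsing.
CHILDREN = {
    "root": {"p", "blockquote", "ul"},
    "p": INLINE,
    "blockquote": {"p", "blockquote", "ul"},
    "ul": {"li"},
    "li": INLINE | {"p", "ul", "blockquote"},
    "a": {"em", "strong", "code"},
    "em": INLINE,
    "strong": INLINE,
    "code": INLINE,
}

URL_ATTRIBUTES = {"href", "cite"}
SAFE_SCHEMES = {"http", "https"}  # assumption: absolute web URLs only


def is_safe_url(value):
    """Accept only URLs whose scheme is explicitly allowed."""
    try:
        scheme = urlsplit(value.strip()).scheme
    except ValueError:
        return False
    return scheme.lower() in SAFE_SCHEMES


def check_comment(markup):
    """Return a list of error strings; an empty list means the comment passed."""
    # The default tree builder silently drops comments and processing
    # instructions, so refuse them (and CDATA/DOCTYPE) before parsing.
    if "<!" in markup or "<?" in markup:
        return ["comments, CDATA, DOCTYPEs and processing instructions are not allowed"]
    try:
        root = ET.fromstring(f"<root>{markup}</root>")
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    errors = []
    _check_children(root, errors)
    return errors


def _check_children(parent, errors):
    """Recursively check each child against the allowlists."""
    allowed_children = CHILDREN.get(parent.tag, set())
    for child in parent:
        if child.tag not in ALLOWED:
            errors.append(f"element <{child.tag}> is not allowed")
            continue  # do not descend into an element we reject outright
        if child.tag not in allowed_children:
            errors.append(f"<{child.tag}> may not appear inside <{parent.tag}>")
        for name, value in child.attrib.items():
            if name not in ALLOWED[child.tag]:
                errors.append(f"attribute {name} is not allowed on <{child.tag}>")
            elif name in URL_ATTRIBUTES and not is_safe_url(value):
                errors.append(f"unsafe URL in {name} on <{child.tag}>")
        _check_children(child, errors)
```

With these example rules, check_comment("<p>Hello <em>world</em></p>") returns an empty list, while a blockquote placed inside a p, an onclick attribute or a javascript: link each produce an error message. Using an allowlist for URL schemes, rather than a blocklist that looks for the string javascript:, means unusual capitalisation or odd schemes are rejected by default.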